Authors: Mauro Venticinque, Angelo Schillaci, Daniele Tambone

GitHub project: Bank-Marketing

Date: 2025-03-24

Introduction

Here we will write some information about the project.

1 Exploratory Data Analysis

datatable(head(train, 100), options = list(scrollX = TRUE))
str(train)
## 'data.frame':    32950 obs. of  22 variables:
##  $ X             : int  35248 39854 14530 27822 40199 21227 16836 39099 38565 38152 ...
##  $ age           : int  30 39 43 27 56 41 57 46 61 35 ...
##  $ job           : chr  "blue-collar" "technician" "services" "student" ...
##  $ marital       : chr  "married" "married" "single" "single" ...
##  $ education     : chr  "professional.course" "university.degree" "high.school" "high.school" ...
##  $ default       : chr  "no" "no" "no" "no" ...
##  $ housing       : chr  "no" "yes" "no" "yes" ...
##  $ loan          : chr  "no" "no" "no" "no" ...
##  $ contact       : chr  "cellular" "cellular" "cellular" "cellular" ...
##  $ month         : chr  "may" "jun" "jul" "mar" ...
##  $ day_of_week   : chr  "fri" "mon" "tue" "thu" ...
##  $ duration      : int  1357 713 1317 80 230 697 1441 679 106 234 ...
##  $ campaign      : int  4 2 4 4 2 2 2 1 2 1 ...
##  $ pdays         : int  999 999 999 999 999 999 999 999 999 999 ...
##  $ previous      : int  1 0 0 0 1 0 0 0 1 0 ...
##  $ poutcome      : chr  "failure" "nonexistent" "nonexistent" "nonexistent" ...
##  $ emp.var.rate  : num  -1.8 -1.7 1.4 -1.8 -1.7 1.4 1.4 -3 -3.4 -3.4 ...
##  $ cons.price.idx: num  92.9 94.1 93.9 92.8 94.2 ...
##  $ cons.conf.idx : num  -46.2 -39.8 -42.7 -50 -40.3 -36.1 -42.7 -33 -26.9 -29.8 ...
##  $ euribor3m     : num  1.25 0.72 4.96 1.65 0.87 ...
##  $ nr.employed   : num  5099 4992 5228 5099 4992 ...
##  $ subscribed    : chr  "yes" "yes" "yes" "yes" ...
attach(train)

1.1 Variable descriptions

1.1.1 Bank client data

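  1. X (Integer): row identifier; the values in the str() output look like row indices carried over from the original file rather than a client attribute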
  2. age (Integer): age of the customer
  3. job (Categorical): occupation
  4. marital (Categorical): marital status
  5. education (Categorical): education level
  6. default (Binary): has credit in default?
  7. housing (Binary): has housing loan?
  8. loan (Binary): has personal loan?
  9. contact (Categorical): contact communication type
  10. month (Categorical): last contact month of year
  11. day_of_week (Categorical): last contact day of the week
  12. duration (Integer): last contact duration, in seconds. Important note: this attribute strongly affects the target (e.g., if duration = 0 then subscribed = ‘no’). However, the duration is not known before a call is made, and once the call has ended the outcome is known. This input should therefore be included only for benchmarking and discarded if the goal is a realistic predictive model
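The note on duration can be turned into two predictor sets, one for benchmarking and one for a realistic model. The following is a minimal sketch on a toy data frame; `toy`, `predictors_benchmark`, and `predictors_realistic` are hypothetical names, and only the column names are taken from the str(train) output.

```r
# Toy stand-in for `train` (hypothetical values; column names follow str(train))
toy <- data.frame(
  age        = c(30L, 39L, 43L),
  duration   = c(1357L, 713L, 1317L),
  campaign   = c(4L, 2L, 4L),
  subscribed = c("yes", "yes", "no")
)

# Benchmark set: every column except the target, duration included
predictors_benchmark <- setdiff(names(toy), "subscribed")

# Realistic set: duration removed, since it is only known after the call
predictors_realistic <- setdiff(predictors_benchmark, "duration")

# Data frame that a realistic model would be fitted on
toy_realistic <- toy[, c(predictors_realistic, "subscribed"), drop = FALSE]
print(names(toy_realistic))
```

The same two vectors could be applied to `train` itself, so that both variants are fitted from one source and compared side by side.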

1.1.2 Other attributes

  1. campaign (Integer): number of contacts performed during this campaign for this client (includes the last contact)
  2. pdays (Integer): number of days since the client was last contacted in a previous campaign (999 means the client was not previously contacted)
  3. previous (Integer): number of contacts performed before this campaign for this client
  4. poutcome (Categorical): outcome of the previous marketing campaign (‘failure’, ‘nonexistent’, ‘success’)
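In the str(train) output, pdays is 999 for every row shown, which suggests that 999 is the sentinel for clients who were not previously contacted. Treating it as a real day count would distort summaries and correlations, so one option is to split it into an indicator and a cleaned count. This is a sketch on toy values; `contacted_before` and `pdays_clean` are hypothetical column names, and the meaning of 999 is an assumption based on the output above.

```r
# Toy stand-in for train$pdays (hypothetical values)
toy <- data.frame(pdays = c(999L, 3L, 999L, 6L))

# Assumption: 999 marks "not previously contacted"
toy$contacted_before <- toy$pdays != 999L

# Replace the sentinel with NA so it no longer counts as a number of days
toy$pdays_clean <- ifelse(toy$pdays == 999L, NA_integer_, toy$pdays)

print(toy)
```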

1.1.3 Social and economic context attributes

  1. emp.var.rate (Numeric): employment variation rate - quarterly indicator
  2. cons.price.idx (Numeric): consumer price index - monthly indicator
  3. cons.conf.idx (Numeric): consumer confidence index - monthly indicator
  4. euribor3m (Numeric): euribor 3 month rate - daily indicator
  5. nr.employed (Numeric): number of employees - quarterly indicator

1.1.4 Output variable (desired target)

  1. subscribed (Binary): has the client subscribed to a term deposit?

Source: UCI Machine Learning Repository

vis_dat(train)

num_vars <- c("X", "age", "duration", "campaign", "pdays", "previous",
              "emp.var.rate", "cons.price.idx", "cons.conf.idx", "euribor3m", "nr.employed")
corrplot(cor(train[, num_vars]), method = "pie")

plot_ly(train, x = ~job, y = ~age, type = 'box', color = ~job)
## Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors
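The warning above appears because job has more levels than the 8 colours available in the Set2 palette. One way to avoid it is to build a palette with exactly one colour per level and pass it through the `colors` argument of plot_ly. The sketch below uses `grDevices::hcl.colors()` from base R (R 3.6 or later); `toy`, `pal`, and the 12 job labels are hypothetical, since the warning only shows that there are more than 8 levels.

```r
# Toy stand-in for `train` with 12 hypothetical job labels
toy <- data.frame(
  job = paste0("job_", 1:12),
  age = 30:41
)

# One colour per level, taken from the qualitative "Set 2" HCL palette
n_levels <- length(unique(toy$job))
pal <- grDevices::hcl.colors(n_levels, palette = "Set 2")

# Build the box plot only if plotly is installed
if (requireNamespace("plotly", quietly = TRUE)) {
  p <- plotly::plot_ly(toy, x = ~job, y = ~age, type = "box",
                       color = ~job, colors = pal)
}
```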
plot_ly(train, x = ~subscribed, type = 'histogram')
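The histogram of subscribed shows the class balance visually; the same information can be read as counts and proportions. This is a minimal sketch on a toy vector: `toy_subscribed` and its values are hypothetical and say nothing about the actual distribution in train.

```r
# Toy stand-in for train$subscribed (hypothetical values)
toy_subscribed <- c("no", "no", "no", "yes", "no",
                    "no", "no", "no", "yes", "no")

# Absolute counts per class
counts <- table(toy_subscribed)

# Share of each class
props <- prop.table(counts)

print(counts)
print(round(props, 2))
```

On the real data, `table(train$subscribed)` would give the counts behind the histogram bars.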